
    Constructing Hierarchical Image-tags Bimodal Representations for Word Tags Alternative Choice

    This paper describes our solution to the ICML multimodal learning challenge. The solution constructs three-level representations in three consecutive stages and chooses the correct tag words with a data-specific strategy. First, we use standard methods to obtain level-1 representations: each image is represented using MPEG-7 and gist descriptors together with additional features released by the contest organizers, and the corresponding word tags are represented with a bag-of-words model over a dictionary of 4000 words. Second, we learn level-2 representations using two stacked RBMs for each modality. Third, we propose a bimodal autoencoder that learns the similarities/dissimilarities between paired image-tag inputs as level-3 representations. Finally, during the test phase, an observation about the dataset leads us to a data-specific strategy for choosing the correct tag words, which yields a marked improvement in overall performance. Our final average accuracy on the private test set is 100%, ranking first in the challenge.
    Comment: 6 pages, 1 figure. Presented at the Workshop on Representation Learning, ICML 2013
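
    The level-1 tag representation described above is a plain bag-of-words encoding over a fixed dictionary. A minimal Python sketch of that step, assuming a frequency-ranked dictionary; the helper names and sample tags are illustrative, not taken from the paper:

        import numpy as np

        def build_dictionary(tag_lists, size=4000):
            """Keep the `size` most frequent words as the dictionary."""
            counts = {}
            for tags in tag_lists:
                for word in tags:
                    counts[word] = counts.get(word, 0) + 1
            ranked = sorted(counts, key=counts.get, reverse=True)[:size]
            return {word: idx for idx, word in enumerate(ranked)}

        def bag_of_words(tags, dictionary):
            """Binary indicator vector: 1 if the word appears in the tag list."""
            vec = np.zeros(len(dictionary), dtype=np.float32)
            for word in tags:
                if word in dictionary:
                    vec[dictionary[word]] = 1.0
            return vec

        # Illustrative usage with made-up tags.
        corpus = [["beach", "sunset", "ocean"], ["dog", "grass", "beach"]]
        dictionary = build_dictionary(corpus, size=4000)
        print(bag_of_words(["beach", "dog"], dictionary))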

    Challenges in Representation Learning: A report on three machine learning contests

    The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We also provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.
    Comment: 8 pages, 2 figures

    Obtaining Cross Modal Similarity Metric with Deep Neural Architecture

    Analyzing complex systems with multimodal data, such as images and text, has recently received tremendous attention. Modeling the relationship between the different modalities is the key to addressing this problem. Motivated by recent successful applications of deep neural networks to unimodal data, we propose a computational deep neural architecture, the bimodal deep architecture (BDA), for measuring the similarity between different modalities. The BDA consists of three closely related consecutive components. For the image and text modalities, the first component can be constructed using popular feature extraction methods for each individual modality. The second component comprises two types of stacked restricted Boltzmann machines (RBMs): for the image modality, a binary-binary RBM is stacked over a Gaussian-binary RBM; for the text modality, a binary-binary RBM is stacked over a replicated softmax RBM. In the third component, we introduce a variant of the autoencoder with a predefined loss function for discriminatively learning the regularity between the modalities. We show experimentally the effectiveness of our approach on the task of classifying image tags on publicly available datasets.
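
    In the second BDA component, the bottom block of the image pathway is a Gaussian-binary RBM over real-valued features. A minimal numpy sketch of one contrastive-divergence (CD-1) update for such an RBM, assuming unit-variance visible units; the shapes, learning rate, and sample data here are illustrative assumptions, not values from the paper:

        import numpy as np

        rng = np.random.default_rng(0)

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def cd1_step(v0, W, b_vis, b_hid, lr=0.01):
            """One CD-1 update on a batch of Gaussian visible vectors v0."""
            # Positive phase: sample binary hidden units from the data.
            p_h0 = sigmoid(v0 @ W + b_hid)
            h0 = (rng.random(p_h0.shape) < p_h0).astype(v0.dtype)
            # Negative phase: reconstruct the Gaussian visibles (mean-field),
            # then recompute the hidden probabilities.
            v1 = h0 @ W.T + b_vis          # mean of the Gaussian visibles
            p_h1 = sigmoid(v1 @ W + b_hid)
            # Approximate gradient: data statistics minus reconstruction statistics.
            n = v0.shape[0]
            W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
            b_vis += lr * (v0 - v1).mean(axis=0)
            b_hid += lr * (p_h0 - p_h1).mean(axis=0)
            return W, b_vis, b_hid

        # Illustrative usage: 128-d image features, 64 hidden units.
        v = rng.standard_normal((32, 128)).astype(np.float32)
        W = 0.01 * rng.standard_normal((128, 64)).astype(np.float32)
        b_vis = np.zeros(128, dtype=np.float32)
        b_hid = np.zeros(64, dtype=np.float32)
        W, b_vis, b_hid = cd1_step(v, W, b_vis, b_hid)

    The binary-binary RBM stacked on top follows the same update with the reconstruction step replaced by a sigmoid over the hidden activations of the layer below.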